Atypicity detection in data streams: A self-adjusting approach

Authors

  • Alice Marascu
  • Florent Masseglia
Abstract

Outlyingness is a subjective concept relying on the isolation level of a (set of) record(s). Clustering-based outlier detection is a field that aims to cluster data and to detect outliers depending on their characteristics (i.e. small, tight and/or dense clusters might be considered as outliers). Existing methods require a parameter standing for the “level of outlyingness”, such as the maximum size or a percentage of small clusters, in order to build the set of outliers. Unfortunately, manually setting this parameter in a streaming environment is not feasible, given the fast response time usually needed. In this paper we propose WOD, a method that separates outliers from clusters thanks to a natural and effective principle. The main advantages of WOD are its ability to automatically adjust to any clustering result and to be parameterless.
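
To make the parameter dependence of existing methods concrete, here is a minimal sketch of the two thresholding styles the abstract mentions (a maximum cluster size, and a percentage of the smallest clusters). This is not WOD, whose principle is not described in the abstract; the function names, the `labels` input, and the toy data are assumptions made purely for illustration.

```python
from collections import Counter

def outliers_by_max_size(labels, max_size):
    """Return indices of records whose cluster has at most max_size members.

    `max_size` is the hand-set "level of outlyingness" parameter (hypothetical name).
    """
    sizes = Counter(labels)
    return [i for i, lab in enumerate(labels) if sizes[lab] <= max_size]

def outliers_by_percentage(labels, fraction):
    """Return indices of records in the smallest `fraction` of clusters.

    Clusters are ranked by size, smallest first; `fraction` is hand-set.
    """
    sizes = Counter(labels)
    ordered = sorted(sizes, key=lambda lab: sizes[lab])  # smallest clusters first
    n_small = int(len(ordered) * fraction)
    small = set(ordered[:n_small])
    return [i for i, lab in enumerate(labels) if lab in small]

# Toy clustering result: one cluster label per record (illustrative data only).
labels = ["a"] * 6 + ["b"] * 5 + ["c"] * 2 + ["d"]

# The reported outliers change with the manually chosen parameter.
print(outliers_by_max_size(labels, 1))
print(outliers_by_max_size(labels, 2))
print(outliers_by_percentage(labels, 0.25))
print(outliers_by_percentage(labels, 0.5))
```

In both variants the set of outliers depends on a value someone has to choose by hand, which is the step the abstract argues is impractical in a streaming setting.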

Similar resources

A Multi-resolution Approach for Atypical Behaviour Mining

Atypical behaviours are the basis of valuable knowledge in domains related to security (e.g. fraud detection for credit cards [1], cyber security [4] or safety of critical systems [6]). Atypicity generally depends on the isolation level of a (set of) records, compared to the dataset. One possible method for finding atypical records proceeds in two steps. The first step is a clustering (group...

Extraction de motifs séquentiels dans les flux de données. (Sequential patterns mining from data streams)

In recent years, many applications dealing with data generated continuously and at high speeds have emerged. These data are now qualified as data streams. Dealing with potentially infinite quantities of data imposes constraints that raise many processing problems. As an example of such constraints we have the inability to block the data stream as well as the need to produce results in real time. ...

Developing A Fault Diagnosis Approach Based On Artificial Neural Network And Self Organization Map For Occurred ADSL Faults

Telecommunication companies have received a great deal of research attention, which have many advantages such as low cost, higher qualification, simple installation and maintenance, and high reliability. However, the use of technical maintenance approaches in telecommunication companies could improve system reliability and users' satisfaction with Asymmetric digital subscriber line (ADSL) ser...

Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows

Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that frequently appear in the data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...

Target Detection Improvements in Hyperspectral Images by Adjusting Band Weights and Identifying end-members in Feature Space Clusters

Spectral target detection could be regarded as one of the strategic applications of hyperspectral data analysis. The presence of targets in an area smaller than a pixel’s ground coverage has led to the development of spectral un-mixing methods to detect these types of targets. Usually, in spectral un-mixing algorithms, similar weights are assumed for the spectral bands. Howe...

Journal:
  • Intell. Data Anal.

Volume 15

Publication date: 2011